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Abstract 

Swarm intelligence has becoming a powerful technique in solving design and scheduling tasks. 
Metaheuristic algorithms are an integrated part of this paradigm, and particle swarm optimiza- 
tion is often viewed as an important landmark. The outstanding performance and efficiency of 
swarm-based algorithms inspired many new developments, though mathematical understanding 
of metaheuristics remains partly a mystery. In contrast to the classic deterministic algorithms, 
metaheuristics such as PSO always use some form of randomness, and such randomization now 
employs various techniques. This paper intends to review and analyze some of the convergence 
and efflciency associated with metaheuristics such as firefly algorithm, random walks, and Levy 
flights. We will discuss how these techniques are used and their implications for further research. 
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1 Swarm Intelligence 

The emerging trend of cognitive informatics is to use swarm intelligence to find solutions to chal- 
lenging design problems and real-world applications. A representative example is the development 
of particle swarm optimization(PSO), which can be considered as a classic landmark in using swarm 
intelligence or collective intelligence ^^^^ . Another excellent example is the ant colony optimiza- 
tion which mimics the collective behaviour of social insects . These algorithms are metaheuristic 
and swarm-based, though they may have some similarities with genetic algorithms but 
it is much simpler because they do not use mutation/crossover operators. Genetic algorithms are 
population-based, which was based on the evolution theory and survival of the fittest I-'^^I . Similarly, 
PSO is also population-based, but it was inspired by the swarm behaviour of fish and birds, and 
therefore falls into the different category from genetic algorithms. 

Swarm-based or population-based algorithms form the majority of nature-inspired metaheuristic 
algorithms with a wider range of applications ^"1. Population-based algorithms 

are not necessarily 'intelligent'. In contrast, the the trajectory-based algorithms such as simulated 
annealing l^-'^l may behave slightly 'intelligent' somehow. Even an algorithm is called swarm-based, 
but it is not really 'intelligent' in the normal sense. What we really mean is that the algorithm 
intends to mimic the swarm behaviour and work in an idealized manner, in the hope that it can 
provide better and more efficient search for finding tough optimization problems, and thus showing 
some sort of 'intelligence' comparing with the classical mechanistical algorithms. 

Since the appearance of swarm intelligence algorithms such as PSO in the 1990s, more than 
a dozen of new metaheuristic algorithms have been developed ^' and these algorithms 

have been applied to almost all areas of optimization, design, scheduling and planning, data mining, 
machine intelligence, and many others. Thousands of research papers and dozens of books have 
been published I^^' , 



Despite the rapid development of swarm-based metaheuristics, their mathematical analysis re- 
mains partly unsolved, and many open problems need urgent attention. This difficulty is largely due 
to the fact the interaction of various components in metaheuristic algorithms are highly nonlinear, 
complex, and stochastic. Studies have attempted to carry out convergence analysis using the theory 
and methods of dynamical systems, and some important results concerning PSO were obtained . 
However, for other metaheuristics such as firefly algorithms and ant colony optimization, it remains 
an active, challenging topic. On the other hand, even we have not proved or cannot prove their 
convergence, we still can compare their performance of various algorithms. This has indeed formed 
a majority of current research in algorithm development in the research community of optimization 
and machine intelligence [^"^ ■^^^ '^■'1 . 

Another important issue is the various randomization techniques used for modern metaheuristics, 
from simple randomization such as uniform distribution to random walks, and to more elaborate 
Levy flights 24, 33] _ j.-^^^^ 

is no unified approach to analyze these mathematically. In this paper, 
we intend to review the convergence of swarm-based algorithms including PSO and firefly algorithms. 
We also try to analyze the mathematical and statistical foundation for randomization techniques 
from simple random walks to Levy flights. As the result of the analysis, we will study the proper 
step size for randomization in metaheuristics. Finally, we will discuss the implication and further 
research topics in swarm intelligence and their applications. 



2 Particle Swarm Optimization 
2.1 Standard PSO 

Particle swarm optimization (PSO) was developed by Kennedy and Eberhart in 1995 I^*' based 
on the swarm behaviour such as fish and bird schooling in nature. Since then, PSO has generated 
much wider interests, and forms an exciting, ever-expanding research subject, called swarm intelli- 
gence. PSO has been applied to almost every area in optimization, computational intelligence, and 
design/scheduling applications. There are at least two dozens of PSO variants and most of them 
were developed by introducing or varying certain components I^' ^1 . Hybrid algorithms by combining 
PSO with other existing algorithms are also increasingly popular. 

In essence, PSO searches the design space of an optimization problem by adjusting the trajectories 
of individual agents, called particles, as the piecewise paths formed by positional vectors in a quasi- 
stochastic manner. The movement of a swarming particle consists of two major components: a 
stochastic component and a deterministic component. Each particle is attracted toward the position 
of the current global best g* and its own best location a3| in history, while at the same time it has 
a tendency to move randomly. 

Let Xi and Vi be the position vector and velocity for particle i, respectively. The new velocity 
vector is determined by the following formula 

vl'^^ =vl + aei © [g* - a;-] + /3e2 [a;* - a;*]. (1) 

where Cj and £2 are two random vectors, and each entry taking the values between and 1. The 
Hadamard product of two matrices u Q v is defined as the entrywise product, that is [it v]ij = 
UijVij. The parameters a and 13 are the learning parameters or acceleration constants, which can 
typically be taken as, say, a /3 sa 2. 

The initial locations of all particles should distribute relatively uniformly so that they can sample 
over most regions, which is especially important for multimodal problems. The initial velocity of a 
particle can be taken as zero, that is, t)*"" = 0. The new position can then be updated by 

xl^'=xl+vl''. (2) 

Although Vi can be any values, it is usually bounded in some range [I'min, ■Wmax]- 

There are many variants which extend the standard PSO algorithm, and the most noticeable 
improvement is probably to use inertia function 6{t) so that is replaced by 0{t)vl 

= 0v'^ + aei [g* - x^] + /3e2 [x* - x'^], (3) 



where 9 takes the values between and 1 W. In the simplest case, the inertia function can be taken 
as a constant, typically « 0.5 ~ 0.9. This is equivalent to introducing a virtual mass to stabilize 
the motion of the particles, and thus the algorithm is expected to converge more quickly. 



2.2 Accelerated PSO 

The standard particle swarm optimization uses both the current global best g* and the individual 
best X* . The reason of using the individual best is primarily to increase the diversity in the quality 
solutions, however, this diversity can be simulated using some randomness. Subsequently, there is 
no compelling reason for using the individual best, unless the optimization problem of interest is 
highly nonlinear and multimodal. 

A simplified version which could accelerate the convergence of the algorithm is to use the global 
best only [■^^1 . Thus, in the accelerated particle swarm optimization, the velocity vector is generated 
by a simpler formula 

vl+' =vl + a{e- 1/2) + P{g*-x^, (4) 

where e is a random variable with values from to 1. Here the shift 1/2 is purely out of convenience. 
We can also use a standard normal distribution ae„ where e„ is drawn from N{0, 1) to replace the 
second term. The update of the position is simply 

xl^'=xl+vl^\ (5) 

In order to increase the convergence even further, we can also write the update of the location in a 
single step 

xl+' = il-p)xl+pg* + aen. (6) 

This simpler version will give the same order of convergence. The typical values for this accelerated 
PSO are a « 0.1 ~ 0.4 and /3 « 0.1 ~ 0.7, though a « 0.2 and /3 « 0.5 can be taken as the initial 
values for most unimodal objective functions. It is worth pointing out that the parameters a and ^ 
should in general be related to the scales of the independent variables Xi and the search domain. 

A further improvement to the accelerated PSO is to reduce the randomness as iterations proceed. 
This means that we can use a monotonically decreasing function such as 

a = aoe-^\ (7) 



or 

a = ao7*, (0 < 7 < 1), (8) 

where ao ~ 0.5 ^ 1 is the initial value of the randomness parameter. Here t is the number of 
iterations or time steps. < 7 < 1 is a control parameter. This is similar to the geometric cooling 
schedule used in simulated annealing . 

As we will see that the convergence of the standard PSO can be guaranteed under appropriate 
conditions. The accelerated PSO can also have guaranteed convergence, which forms the basis for 
the convergence of the firefly algorithm algorithm to be introduced later. Interestingly, some chaotic 
behaviour may emerge under the right conditions, which may be used as advantage for efficient 
randomization. 



2.3 Convergence Analysis 

The first convergence analysis of PSO was carried out by Clerc and Kennedy in 2002 1^1 using the 
theory of dynamical systems. Mathematically, if we ignore the random factors, we can view the 
system formed by (1) and (2) as a dynamical system. If we focus on a single particle i and imagine 
that there is only one particle in this system, then the global best g* is the same as its current best 
X*. In this case, we have 

vl+'=vl+j{g*-xl), -f = a + /3, (9) 

and 

xl^'=xl + vl^\ (10) 



Considering the ID dynamical system for particle swarm optimization, wc can replace g* by a 
parameter constant p so that we can see if or not the particle of interest will converge towards p. By 
setting ut = p — x{t + 1) and using the notations for dynamical systems, we have a simple dynamical 
system 

Vt+i = Vt + -fut, Uf+i = -■Uf + (1 - 7)ut, (11) 

or 

The general solution of this dynamical system can be written as 

Ft =yoexp[At]. (13) 
The main behaviour of this system can be characterized by the eigenvalues A of A 



K. = l-l±^^^. (14) 

It can be seen clearly that 7 = 4 leads to a bifurcation. 

Following a straightforward analysis of this dynamical system, we can have three cases. For 
< 7 < 4, cyclic and/or quasi-cyclic trajectories exist. In this case, when randomness is gradually 
reduced, some convergence can be observed. For 7 > 4, non-cyclic behaviour can be expected 
and the distance from Yt to the center (0, 0) is monotonically increasing with t. In a special case 
7 = 4, some convergence behaviour can be observed. For detailed analysis, please refer to Clerc and 
Kennedy '^l. Since p is linked with the global best, as the iterations continue, it can be expected 
that all particles will aggregate towards the the global best. 



3 Firefly Algorithm 

Firefly Algorithm (FA) was developed by Xin-She Yang at Cambridge University in late 2007 and 
early 2008 [32- 34] ^ ^Yiidi was based on the flashing patterns and behaviour of fireflies. In essence, 
FA uses the following three idealized rules: 

• Fireflies are unisex so that one firefly will be attracted to other flreflies regardless of their sex; 

• The attractiveness is proportional to the brightness and they both decrease as their distance 
increases. Thus for any two flashing fireflies, the less brighter one will move towards the 
brighter one. If there is no brighter one than a particular firefiy, it will move randomly; 

• The brightness of a firefiy is determined by the landscape of the objective function. 

For a maximization problem, the brightness can simply be proportional to the value of the 
objective function. Other forms of brightness can be defined in a similar way to the fltness function 
in genetic algorithms. 

3.1 Variations of Attractiveness 

As both light intensity and attractiveness affect the movement of fireflies in the fircifly algorithm, 
we have to define their variations. For simplicity, we can always assume that the attractiveness 
of a firefiy is determined by its brightness which in turn is associated with the encoded objective 
function. In the simplest case for maximum optimization problems, the brightness / of a firefly at 
a particular location x can be chosen as I{x) oc f{x). However, the attractiveness (i is relative, it 
should be seen in the eyes of the beholder or judged by the other firefiies. Thus, it will vary with 
the distance Vij between firefiy i and firefiy j. 



In addition, light intensity decreases with the distance from its source, and hght is also absorbed 
in the media, so we should allow the attractiveness to vary with the degree of absorption. In the 
simplest form, the light intensity /(r) varies according to the inverse square law 

I{r) = ^, (15) 

where Is is the intensity at the source. In a medium with a fixed light absorption coefficient 7, the 
light intensity / varies with the distance r. That is 

/ = /oe-^^ (16) 

where Iq is the original light intensity. In order to avoid the singularity at r = in the expression 

Is/r'^, the combined effect of both the inverse square law and absorption can be approximated as 
the following Gaussian form /(r) = Ioe~'^^ . As a firefly's attractiveness is proportional to the light 
intensity seen by adjacent fireflies, we can now deflne the attractiveness ^ of a firefly by 

/3 = /3oe-^'-', (17) 

where /3o is the attractiveness at r = 0. In fact, equation (17) defines a characteristic distance 
r = over which the attractiveness changes significantly from /3o to j3oe~^ . 

The distance between any two fireflies i and j at Xi and Xj, respectively, is the Cartesian distance 



d 

^(aJj.fc - Xj^kY, (18) 
fe=i 

where is the fcth component of the spatial coordinate Xi of ith firefly. It is worth pointing 
out that the distance r defined above is not limited to the Euclidean distance. We can define 
other distance r in the n-dimensional hyperspace, depending on the type of problem of our interest. 
For example, for job scheduling problems, r can be defined as the time lag or time interval. For 
complicated networks such as the Internet and social networks, the distance r can be defined as 
the combination of the degree of local clustering and the average proximity of vertices. In fact, any 
measure that can effectively characterize the quantities of interest in the optimization problem can 
be used as the 'distance' r. 

The movement of a firefly i is attracted to another more attractive (brighter) firefly j is deter- 
mined by 

Xi = X., + Poer'^'^ii {xj - Xi) +a Ei, (19) 

where the second term is due to the attraction. The third term is randomization with a being the 
randomization parameter, and is a vector of random numbers drawn from a Gaussian distribution 
or uniform distribution. For most implementations, we can take /3o = 1 and a = 0{1). It is 
worth pointing out that (19) is a random walk biased towards the brighter fireflies. If /3o = 0, it 
becomes a simple random walk. Furthermore, the randomization term can easily be extended to 
other distributions such as Levy flights t^^' 

The parameter 7 now characterizes the variation of the attractiveness, and its value is crucially 
important in determining the speed of the convergence and how the FA algorithm behaves. In 
theory, 7 G [0, 00), but in practice, 7 = 0(1) is determined by the characteristic length F of the 
system to be optimized. Thus, for most applications, it typically varies from 10~^ to 10^. 

3.2 Asymptotic Limits 

The typical scale F should be associated with the scale concerned in our optimization problem. If F is 

the typical scale for a given optimization problem, for a very large number of fireflies k where k 
is the number of local optima, then the initial locations of these n fireflies should distribute relatively 
uniformly over the entire search space. As the iterations proceed, the fireflies would converge into 



Figure 1: The chaotic map of the iteration formula (26) in the firefly algorithm and the transition 
between from periodic/multiple states to chaos. 



all the local optima (including the global ones). By comparing the best solutions among all these 
optima, the global optima can easily be achieved. 

Two important limiting or asymptotic cases when 7 — > and 7 — >■ cxd. One limiting case is 
7 — > 0, the attractiveness is constant (3 — f3o and F — > 00, this is equivalent to saying that the light 
intensity does not decrease in an idealized sky. Thus, a flashing firefly can be seen anywhere in 
the domain. Thus, a single (usually global) optima can easily be reached. If we replace Xj by the 
current global best g^, then the Firefly Algorithm becomes the special case of accelerated particle 
swarm optimization (PSO) discussed earlier. Subsequently, the efhciency of this special case is the 
same as that of PSO. 

Another limiting case is 7 — > 00, leading to F and f3{r) d{r) which is the Dirac delta 
function, which means that the attractiveness is almost zero in the sight of other fireflies. This 
is equivalent to the case where the fireflies roam in a very thick foggy region randomly. No other 
fireflies can be seen, and each firefly roams in a completely random way. Therefore, this corresponds 
to the completely random search method. 

In general, firefly algorithm usually works between these two extremes, and it is thus possible 
to adjust the parameter 7 and a so that it can outperform both the random search and PSO. In 
fact, FA can find the global optima as well as the local optima simultaneously and effectively I'^^l . A 
further advantage of FA is that different fireflies will work almost independently, it is thus particular 
suitable for parallel implementation. It is even better than genetic algorithms and PSO because 
fireflies aggregate more closely around each optimum. It can be expected that the interactions 
between different subregions are minimal in parallel implementation. 

3.3 Convergence and Chaos 

We can carry out the convergence analysis for the firefly algorithm in a framework similar to Clerc 
and Kennedy's dynamical analysis. For simplicity, we start from the equation for firefiy motion 
without the randomness term 




(20) 



(21) 



where the distance can be given by the ^2-iiorm 



(22) 



In an even simpler 1-D case, we can set yt = g — x\, and we have 

yt+i =yt- Poe-^^ht- (23) 
Using the approximation of < jy^ <C 1, we have 

yt+i =yt- /3o(l - 7%' )2/t = Vtl^ - - IVt)]- (24) 

We can sec that 7 is a scahng parameter which only affects the scales/size of the firefly movement. 
In fact, we can let Ut = ^J^yt and rewrite the above the equation as 

ut+x=ut{\-h{\-u\)\, (25) 

or directly from (23) 

Ut+x = Ut\\-p^e-<\. (26) 

These equations can be analyzed easily using the same methodology for studying the well-known 
logistic map 

Ut = Xut{l - Ut). (27) 

The chaotic map is shown in Fig. 1, and the focus on the transition from periodic multiple states 

to chaotic behaviour is shown in the same figure. 

As we can see from Fig. 1 that convergence can be achieved for /3o < 2. There is a transition 
from periodic to chaos at /3o ~ 4. This may be surprising, as the aim of designing a metaheuristic 
algorithm is to try to find the optimal solution efficiently and accurately. However, chaotic behaviour 
is not necessarily a nuisance, in fact, we can use it to the advantage of the firefly algorithm. Simple 
chaotic characteristics from (27) can often be used as an efficient mixing technique for generating 
diverse solutions. Statistically, the logistic mapping (27) for A = 4 for the initial states in (0,1) 
corresponds a beta distribution 

when p = q = 1/2. Here T{z) is the Gamma function 



r(^) = 

Jo 



*dt. (29) 



In the case when 2; = n is an integer, we have r(n) = (n — 1)!. In addition, r(l/2) = y/n. From 
the algorithm implementation point of view, we can use higher attractiveness /3o during the early 
stage of iterations so that the fireflies can explore, even chaotically, the search space more effectively. 
As the search continues and convergence approaches, we can reduce the attractiveness j3o gradually, 
which may increase the overall efficiency of the algorithm. Obviously, more studies are highly needed 
to confirm this. 



4 Essential Components of Metaheuristics 

The efficiency of metaheuristic algorithms can be attributed to the fact that they imitate the best 
features in nature, especially the selection of the fittest in biological systems which have evolved by 
natural selection over millions of years. 

Metaheuristics can be considered as an efficient way to produce acceptable solutions by trial 
and error to a complex problem in a reasonably practical time. The complexity of the problem of 
interest makes it impossible to search every possible solution or combination, the aim is to find good 
feasible solution in an acceptable timcscalc. There is no guarantee that the best solutions can be 
found, and we even do not know whether an algorithm will work and why if it does work. The idea 
is to have an efficient but practical algorithm that will work most the time and is able to produce 



good quality solutions. Among the found quality solutions, it is expected some of them are nearly 
optimal, though there is no guarantee for such optimality. 

The main components of any metaheuristic algorithms are: intensification and diversification, 
or exploitation and exploration ^"^1 . Diversification means to generate diverse solutions so as to 
explore the search space on the global scale, while intensification means to focus on the search in a 
local region by exploiting the information that a current good solution is found in this region. This 
is in combination with with the selection of the best solutions. 

The selection of the best ensures that the solutions will converge to the optimality. On the other 
hand, the diversification via randomization avoids the solutions being trapped at local optima, while 
increases the diversity of the solutions. The good combination of these two major components will 
usually ensure that the global optimality is achievable. 

The fine balance between these two components is very important to the overall efficiency and 
performance of an algorithm. Too little exploration and too much exploitation could cause the 
system to be trapped in local optima, which makes it very difficult or even impossible to find the 
global optimum. On the other hand, if too much exploration but too little exploitation, it may be 
difficult for the system to converge and thus slows down the overall search performance. The proper 
balance itself is an optimization problem, and one of the main tasks of designing new algorithms is 
to find a certain balance concerning this optimality and/or tradeoff. Furthermore, just exploitation 
and exploration are not enough. During the search, we have to use a proper mechanism or criterion 
to select the best solutions. The most common criterion is to use the survival of the fittest, that is to 
keep updating the the current best found so far. In addition, certain elitism is often used, and this 
is to ensure the best or fittest solutions are not lost, and should be passed onto the next generations. 

5 Randomization Techniques 

As discussed earlier, an important component in swarm intelligence and modern mctaheuristics is 
randomization, which enables an algorithm to have the ability to jump out of any local optimum 
so as to search globally. Randomization can also be used for local search around the current best 
if steps are limited to a local region. Fine-tuning the randomness and balance of local search and 
global search is crucially important in controlling the performance of any metaheuristic algorithm. 

Randomization techniques can be a very simple method using uniform distributions, or more 
complex methods as those used in Monte Carlo simulations I^^^ . They can also be more elaborate, 
from Brownian random walks to Levy flights. 

5.1 Random Walks 

A random walk is a random process which consists of taking a series of consecutive random steps. 
Mathematically speaking, let Sn denotes the sum of each consecutive random step X,, then Sn 
forms a random walk 

N 

5]v = = Xi + ... + Xn, (30) 

i=l 

where Xi is a random step drawn from a random distribution. This relationship can also be written 
as a recursive formula 

N-l 

Sn = +Xn = Sn-1 + Xn, (31) 

which means the next state Sn will only depend the current existing state ^jv-i and the motion 
or transition Xn from the existing state to the next state. This is typically the main property of a 
Markov chain to be introduced later. Here the step size or length in a random walk can be flxed or 
varying. Random walks have many applications in physics, economics, statistics, computer sciences, 
environmental science and engineering. 

In theory, as the number of steps N increases, the central limit theorem implies that the random 
walk (31) should approaches a Gaussian distribution. In addition, there is no reason why each step 



length should be fixed. In fact, the step size can also vary according to a known distribution. If the 
step length obeys the Gaussian distribution, the random walk becomes the Brownian motion '^^^ . 

As the mean of particle locations is obviously zero, their variance will increase linearly with t. 
In general, in the rf-dimensional space, the variance of Brownian random walks can be written as 

a'^{t) = \vo\h'^ + i2dD)t, (32) 

where vq is the drift velocity of the system. Here D = s^/(2r) is the effective diffusion coefficient 
which is related to the step length s over a short time interval r during each jump. 

Therefore, the Brownian motion B{t) essentially obeys a Gaussian distribution with zero mean 
and time-dependent variance. That is, B{t) ~ A''(0, cr^(t)) where ^ means the random variance obeys 
the distribution on the right-hand side; that is, samples should be drawn from the distribution. A 
diffusion process can be viewed as a series of Brownian motion, and the motion obeys the Gaussian 
distribution. For this reason, standard diffusion is often referred to as the Gaussian diffusion. If the 
motion at each step is not Gaussian, then the diffusion is called non-Ga\issian diffusion. If the step 
length obeys other distribution, we have to deal with more generalized random walk. A very special 
case is when the step length obeys the Levy distribution, such a random walk is called Levy flight 
or Levy walk. 



5.2 Levy Distribution and Levy Flights 

In nature, animals search for food in a random or quasi-random manner. In general, the foraging 
path of an animal is effectively a random walk because the next move is based on the current 
location/state and the transition probability to the next location. Which direction it chooses depends 
implicitly on a probability which can be modelled mathematically. For example, various studies 
have shown that the flight behaviour of many animals and insects has demonstrated the typical 
characteristics of Levy flights I^^' ^^1. For example, a recent study by Reynolds and Frye shows 

that fruit flies or Drosophila melanogaster, explore their landscape using a series of straight flight 
paths punctuated by a sudden 90° turn, leading to a Levy-flight-style intermittent scale-free search 
pattern l^^l . Subsequently, such behaviour has been applied to optimization and optimal search, and 
preliminary results show its promising capability 1^°' . 

Broadly speaking. Levy flights are a random walk whose step length is drawn from the Levy 
distribution, often in terms of a simple power-law formula L{s) ~ |s|~^~'' where < /3 < 2 is an 
index. Mathematically speaking, a simple version of Levy distribution can be deflned as 

f V^^M-2{i^]-(Iz^, 0</x<s<oo 
L{s,^,IJ,)=l (33) 
l_ otherwise, 

where > is a minimum step and 7 is a scale parameter. Clearly, as s ^ 00, we have 

i^(.,7,M)«/^^. (34) 

This is a special case of the generalized Levy distribution ^'^I . 

In general, Levy distribution should be deflned in terms of Fourier transform 

F{k) = exp[-a\kf], 0<P<2, (35) 

where a is a scale parameter. The inverse of this integral is not easy, as it does not have analytical 
form, except for a few special cases. 
For the case of ^ = 2, we have 

F{k) = exp[-ak% (36) 

whose inverse Fourier transform corresponds to a Gaussian distribution. Another special case is 
(3=1, and we have 

F(A;) = exp[-a|fc|], (37) 
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Figure 2: Steps drawn from a Levy distribution (/3 = 1) using Mantegna's algorithm. 



which corresponds to a Cauchy distribution 

1 7 

P{x,'y,l^) = 2-T7 ^2 ' (38) 

where /i is the location parameter, while 7 controls the scale of this distribution. 
For the general case, the inverse integral 

1 r°° 

L{s) = - cos(fcs)exp[-a|fc|^](iA;, (39) 
Jo 



can be estimated only when s is large. We have 

a P r(/3) sin(7r^/2) 
nls 



L(5)^ -^-7,;;j^^"^ s^oo. (40) 



Levy flights are more efficient than Brownian random walks in exploring unknown, large-scale 
search space. There are many reasons to explain this efficiency, and one of them is due to the fact 
that the variance of Levy flights 

a^{t) ~ t^-^, 1 < /3 < 2, (41) 

increases much faster than the linear relationship (i.e., <T^{t) ~ t) of Brownian random walks. 

From the implementation point of view, the generation of random numbers with Levy flights 
consists of two steps: the choice of a random direction and the generation of steps which obey the 
chosen Levy distribution. The generation of a direction should be drawn from a uniform distribution, 
while the generation of steps is quite tricky. There arc a few ways of achieving this, but one of 
the most efficient and yet straightforward ways is to use the so-called Mantegna algorithm for a 
symmetric Levy stable distribution I^^' . Here 'symmetric' means that the steps can be positive 
and negative. 

A random variable U and its probability distribution can be called stable if a linear combination 
of its two identical copies (or Ui and U2) obeys the same distribution. That is, alli + 6C/2 has 
the same distribution a.s cU + d where a,b > and c, € 3?. If d = 0, it is called strictly stable. 
Gaussian, Cauchy and Levy distributions are all stable distributions. 

In Mantegna's algorithm , the step length s can be calculated by 

where u and v are drawn from normal distributions. That is 

u^N{0,al), vr^N{0,al), (43) 

where 

_ ( r(l+/3)sin(^/?/2) ^1//^ _ 

lr[(l + /3)/2]/3 2(/3-i)/2i ' ^ > 




Figure 3: Schematic representation of 2D Levy flights. 



This distribution (for s) obeys the expected Levy distribution for |s| > |so| where sq is the smallest 
step. In principle, |so| 3> 0, but in reality ,so can be taken as a sensible value such as sq = 0.1 to 1. 

Fig. 2 shows the 250 steps drawn from a Levy distribution with /3 = 1, while Fig. 3 shows 
the path of Levy flights of 250 steps starting from (0,0). It is worth pointing out that a power- 
law distribution is often linked to some scalc-frec characteristics, and Levy flights can thus show 
self-similarity and fractal behavior in the flight patterns. 

Studies show that Levy flights can maximize the efficiency of resource searches in uncertain 
environments. In fact, Levy flights have been observed among foraging patterns of albatrosses and 
fruit flies, and spider monkeys I^**' In addition. Levy flights have many applications. 

Many physical phenomena such as the diffusion of fluorenscent molecules, cooling behavior and 
noise could show Levy-flight characteristics under the right conditions P''! . 



5.3 Step Size in Random Walks 

As random walks are widely used for randomization and local search, a proper step size is very 
important. In the generic equation 

x'+^ =x' + s et, (45) 

et is drawn from a standard normal distribution with zero mean and unity standard deviation. Here 
the step size s determines how far a random walker (e.g., an agent or particle in metaheuristics) can 
go for a fixed number of iterations. 

If s is too large, then the new solution a;*"*"-"^ generated will be too far away from the old solution 
(or more often the current best). Then, such a move is unlikely to be accepted. If s is too small, 
the change is too small to be significant, and consequently such search is not efficient. So a proper 
step size is important to maintain the search as efficient as possible. 

From the theory of simple isotropic random walks, we know that the average distance r traveled 
in the d-dimension space is 

= 2dDt, (46) 

where D = ,s^/2r is the effective diffusion coefficient. Here s is the stc^p size or distance traveled at 
each jump, and t is the time taken for each jump. The above equation implies that 

For a typical length scale L of a dimension of interest, the local search is typically limited in a 

region of L/IO. That is, r — L/10. As the iterations are discrete, we can take t = 1. Typically 
in metaheuristics, we can expect that the number of generations is usually t = 100 to 1000, which 
means that 

r L/10 



Vtd Vtd' 



(48) 



For d = 1 and t = 100, wc have s = O.OIL, while s = G.OOIL for d ~ 10 and t = 1000. As step sizes 
could differ from variable to variable, a step size ratio s/L \s more generic. Therefore, we can use 
s/L = 0.001 to 0.01 for most problems. 

5.4 Swarm Intelligence and Mcirkov Chains 

From the discussion and analysis of metaheuristic algorithms, we know that there is no mathematical 
framework in general to provide insights into the working mechanisms, the stability and convergence 
of a give algorithm. Despite the increasing popularity of metaheuristic, mathematical analysis 
remains fragmental, and many open problems need urgent attention. 

From the statistical point of view, most metaheuristic algorithms can be viewed in the framework 
of Markov chains I^^' ^^1. For example, simulated annealing I^^l is a Markov chain, as the next state 
or new solution in SA only depends on the current state/solution and the transition probability. For 
a given Markov chain with certain crgodicity, a stability probability distribution and convergence 
can be achieved. In fact, it has been proved that SA will always converge if the system is cooled 
down slowly enough and the anncalling process runs long enough. 

Now if look at the PSO closely using the framework of Markov chain Monte Carlo f^^' 
each particle in PSO essentially forms a Markov chain, though this Markov chain is biased towards 
to the CTirrent best, as the transition probability often leads to the acceptance of the move towards 
the current global best. In addition, the multiple Markov chains are interacting in terms of partly 
deterministic attraction movement. Therefore, the; mathematical analysis concerning of the rate of 
convergence of PSO is very difBciilt, if not impossible. Similarly, other swarm-based motaheuristics 
such as firefly algorithms and cuckoo search can also be viewed as interacting Markov chains. There 
is no doubt that any theoretical advance in understanding multiple; interacting Markov chains will 
gain tremendous insight in understanding how the swarm intelligence behaves and may consequently 
lead to the design of better or new metaheuristics. 

6 Discussions 

Swarm-based metaheuristics are becoming increasingly popular, though theoretical framework and 
analysis still lack behind. Many important problems still remain unsolved. Firstly, there is no 
general convergence analysis for all metaheuristics, apart from some fragmental and yet important 
work concerning PSO and SA. Secondly, the choice of algorithm-dependent parameters such as 
population size and probability is currently based on experience and parametric studies. Thirdly, 
stopping criteria are also a bit arbitrary, though fixed accuracy or maximum number of iterations 
are widely used. 

In addition, there is no agreed measure to compare the performance of any two algorithms. 
People either compare the number of function evaluations for a given objective value or compare the 
objective values for a given number of functional evaluations. These results are often affected by 
algorithm-dependent parameters or way of actual implementation. As randomness is an integrated 
part of metaheuristics, statistical measures should be used. A general, statistical framework for 
performance comparison and algorithm evaluations is highly needed. 

Furthermore, no-free-lunch theorems 1^^' may have some important implications in metaheuris- 
tics, however, the basic assiimptions of these theorems may be questionable. Besides, these theorems 
are valid for single objective optimization, and they remain open questions for multiobjective. 

Finally, the current understanding of swarm intelligence is still very limited. How multiagents 
interact with simple rules can lead to complex behaviour, self-organization and eventually intelligence 
still remains a major mystery. Any insight with theoretical basis may have important implications 
to many areas, not only metaheuristics and optimization, but also neural networks and machine 
learning. Whatever the future developments will be, there is no doubt that swarm-based algorithms 
and swarm intelligence will play an ever more important role in science and engineering. 
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